Backward Error Recovery in Redundant Disk Arrays (CMU-CS-94-193)

Authors

  • William V. Courtright
  • Garth A. Gibson
Abstract

Redundant disk arrays are single fault tolerant, incorporating a layer of error handling not found in nonredundant disk systems. Recovery from these errors is complex, due in part to the large number of erroneous states the system may reach. The established approach to error recovery in disk systems is to transition directly from an erroneous state to completion. This technique, known as forward error recovery, relies upon the context in which an error occurs to determine the steps required to reach completion, which implies forward error recovery is design specific. Forward error recovery requires the enumeration of all erroneous states the system may reach and the construction of a forward path from each erroneous state. We propose a method of error recovery which does not rely upon the enumeration of erroneous states or the context in which errors occur. When an error is encountered, we advocate mechanized recovery to an error-free state from which an operation may be retried. Using a form of backward error recovery, we are able to manage the complexity of error recovery in redundant disk arrays without sacrificing performance.

Proceedings of the 1994 Computer Measurement Group Conference (CMG), Orlando FL, Vol. 1, December 4-9, 1994, pp. 63-74.

Figure 1. RAID Levels 0, 1, 3, and 5. This figure depicts the data layout and redundancy organizations for the most prevalent RAID levels using an array of four disks. Data units represent the unit of data access supported by the array. Parity units represent redundancy information generated from the bitwise exclusive-or (parity) of a collection of data units. The redundancy group formed by a parity unit and the data units it protects is commonly known as a parity group.

RAID level 0 offers no redundancy so it is not single fault tolerant. In this illustration, data is block interleaved meaning that the array is optimized for small transfer sizes with each request being serviced by a single drive, increasing throughput. Data may also be bit interleaved, optimizing performance for large transfer sizes by using every drive to transfer data in parallel, increasing bandwidth.

RAID level 1 offers the simplest form of redundancy, maintaining two copies of each data unit, each stored on a different disk. Also referred to as mirroring, this method of redundancy has the highest capacity overhead of the RAID architectures: 50% of disk space is consumed by redundant information. Because the redundant information is a simple copy and both copies reside on independent disks, read operations can be serviced by either copy, making balancing the read workload across the array easier.

RAID level 3 is bit interleaved to optimize bandwidth and relies on parity-based redundancy to reduce capacity overhead. Redundancy is maintained on a single drive, the parity drive. Parity is calculated as the bitwise exclusive-or of the data drives.

RAID level 5 is block interleaved to optimize throughput and uses parity-based redundancy. Both data and parity are evenly distributed throughout the array. A variety of strategies exist to evenly distribute data units and parity units; this illustration uses the left-symmetric layout [Lee90]. Parity is the bitwise exclusive-or of all data units in the parity group.

The remaining RAID levels, 2 and 4, are not shown. RAID level 2 employs Hamming codes and does not rely on a disk’s ECC logic to report errors. Since commodity drives all have sufficient ECC logic, the space and complexity overhead of Hamming codes is generally considered to be unwarranted. RAID level 4 is similar to RAID level 5 but does not evenly distribute the parity, creating “hot spots” in the array [Patterson88].
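
The parity relationship described in the caption can be illustrated with a small sketch. This is not code from the paper: the function names `xor_parity` and `reconstruct` and the two-byte data units are hypothetical, chosen only to show that a parity unit is the bitwise exclusive-or of the data units in its parity group, and that a single lost unit can be rebuilt by taking the exclusive-or of the survivors and the parity.

```python
from functools import reduce


def xor_parity(units: list[bytes]) -> bytes:
    """Return the bitwise exclusive-or of a list of equally sized units."""
    if not units:
        raise ValueError("need at least one unit")
    size = len(units[0])
    if any(len(u) != size for u in units):
        raise ValueError("all units must have the same length")
    # zip(*units) walks the units byte position by byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


def reconstruct(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing data unit of a parity group (hypothetical helper)."""
    return xor_parity(surviving + [parity])


# Three illustrative data units forming one parity group on a four-disk array.
data = [b"\x0f\x10", b"\xf0\x01", b"\x33\x44"]
parity = xor_parity(data)

# Pretend the disk holding data[1] failed; rebuild it from the other units.
rebuilt = reconstruct([data[0], data[2]], parity)
print(parity.hex(), rebuilt.hex())
```

Because exclusive-or is its own inverse, the same routine serves for both generating parity and rebuilding a lost unit. This only works for one missing unit per parity group, which matches the single fault tolerance described in the abstract.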

Similar resources

A Structured Approach to Redundant Disk Array Implementation (CMU-CS-96-137)

Error recovery in redundant disk arrays is typically performed in an ad hoc fashion, requiring architecture-specific code which limits extensibility and is difficult to verify. In this paper, we describe a technique for automating the execution of redundant disk array operations, including recovery from errors, independent of array architecture. Our approach employs a graphical representation o...

On-Line Data Reconstruction in Redundant Disk Arrays (CMU-CS-94-164)

To meet the bandwidth needs of modern computer systems, parallel storage systems are evolving beyond RAID levels 1 through 5. The Parallel Data Lab at Carnegie Mellon University has constructed three Scotch parallel storage testbeds to explore and evaluate five directions in RAID evolution: first, the development of new RAID architectures to reduce the cost/performance penalty of maintaining re...


Publication date: 2015